In Rasch terminology, two or more tests are linked
when they are joined into a pool related to a single scale. When the
same item calibrates at a low level on one test and a high level on a
second test, it indicates that the item is low on the first test
because the other items on that test are higher, and is high on the
second test because the other items on that test are lower. The
difference between the average of the calibrations of a subset of
items on one test and the average for the same items on a second test
is called the linking value. The accuracy of a linking value can be
established by confirmation through the use of a third test, an
algebraic process known as triangulation. The pattern of links
between pairs of field tests is called a linking network. The purpose
of the network is to ensure that sufficient information will be
available to calculate accurate linking values for each test to the
final scale. Four types of linking networks can be used: the
foursquare network, the three-by-three network, the double eight linking
network, and the octagon linking network. (Author/BW)
LINKING GROUPS OF ITEMS

Fred Forster
George Ingebo
Portland Public Schools

General Purpose

Linking is the art of making big ones out of little ones; item pools,
that is. The Rasch calibration of a single test relates the items to an
equal interval scale centered on that test. The power of the Rasch model
is realized when two or more tests are joined into a pool related to a
single scale, or in Rasch terminology "linked" to each other. Since every
student can't take every item, it is necessary to find an alternative
method for tying separate test administrations together. The following
guidelines are intended to establish some principles for linking items
and tests in a variety of practical situations.

Rationale 

When the same item calibrates at a low level on one test and a high
level on a second test, it provides information about the relative
calibrations for the rest of the items on the two tests. The item is low on
the first test because the other items on that test are higher, and is high
on the second test because the other items on that test are lower. For
example, as shown in Figure 1, item X calibrated at 190 on test A and
210 on test B. Since the difference in these calibration values is due
to the level of the other items relative to item X, the twenty point
difference is an estimate of how much the calibrations on one test would
have to be increased or decreased to bring them into line with the other
test. As expected, adjusting test A to test B (A->B) involves the
addition of 20 points to the calibration of each item on test A. In this
case, an item with a calibration of 230 on test A would be assigned a
calibration of 250 in relation to test B items, i.e., would be expected to
calibrate at 250 if included on test B.
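
The single-item adjustment described above can be sketched in a few lines of Python. The function name `adjust_to_target`, the dictionary layout, and the item label "W" are illustrative assumptions; only the calibration values come from the example in the text.

```python
def adjust_to_target(calibrations, linking_value):
    """Shift every calibration on one test by the linking value (in RITs)."""
    return {item: value + linking_value for item, value in calibrations.items()}


# Item X calibrated at 190 on test A and 210 on test B (values from the text).
linking_value_a_to_b = 210 - 190  # the A->B linking value

# Test A calibrations. "W" is a hypothetical label for the item that the
# text describes as calibrating at 230 on test A.
test_a = {"X": 190, "W": 230}

test_a_on_b_scale = adjust_to_target(test_a, linking_value_a_to_b)
print(test_a_on_b_scale)
```

Adjusting in the opposite direction (B->A) would use the same function with the sign of the linking value reversed.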

Obviously, one item is sufficient to link two tests if it is calibrated
at the "right" value on both tests. In practice, the way to ensure the
stability of the linking value between two tests is to include a small
subset of items on both tests. The number of items in this subset is
one of several factors which are considered in planning efficient linking
patterns.



*Calibrations are expressed in Rasch units (RITS).

Figure 1

The Same Item Calibrated on Two Different Tests

Linking Values

A linking value is the difference between the average of the
calibrations of a subset of items on one test and the average for the same items
on a second test. This value can be either positive or negative depending
on the levels of the tests being linked. As shown in Figure 2, the
difference between the averages is +20 points if test A is adjusted to the
scale established by test B (or -20 points if test B is adjusted to test A).
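
As a minimal sketch of this definition, the following Python computes a linking value from the items two tests have in common, using the Figure 2 calibrations. The function name `linking_value` and the dictionary representation are assumptions made for illustration.

```python
from statistics import mean


def linking_value(source, target):
    """Return the source->target linking value from the items common to both tests.

    The value is the average target calibration minus the average source
    calibration, taken over the shared link items only.
    """
    common = sorted(set(source) & set(target))
    if not common:
        raise ValueError("the two tests share no link items")
    return mean(target[i] for i in common) - mean(source[i] for i in common)


# Calibrations for the link items X, Y and Z (Figure 2).
test_a = {"X": 190, "Y": 187, "Z": 214}
test_b = {"X": 210, "Y": 206, "Z": 235}

a_to_b = linking_value(test_a, test_b)  # adjust test A to the test B scale
b_to_a = linking_value(test_b, test_a)  # adjust test B to the test A scale
print(a_to_b, b_to_a)
```

Items that appear on only one of the two tests are ignored by the intersection, so the full calibration tables can be passed in unchanged.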

Linking values are more dependable as the number of items included
in the subset or "link" is increased, but the size of the link does not
guarantee accuracy. Since it is necessary to use a large number of tests
to create an item pool, using a test length of forty items is about right
to encourage teachers to volunteer testing time. Therefore, while an increase
in the length of link increases the reliability of the linking value, it
also significantly decreases the number of items being linked together.
Another factor affecting the size of a link is attrition. The Rasch
program produces several indices of item quality which indicate the items
that did not perform satisfactorily during field testing. Almost certainly
link items will be lost, in both the initial Rasch analysis and during the
linking procedure. Therefore, more items need to be included in a link when
field testing previously untried items than when, for example, developing
parallel forms of an established test.

Triangulation

The accuracy of a linking value is best established by confirmation
through the use of a third test. As shown in Figure 3, adding the linking
values for A->C and C->B should algebraically sum (approximately) to
the linking value A->B. In this example the sum of A->C and C->B is
called a confirming value for the direct value A->B.

The importance of triangulation is that it provides the capability
of identifying and excluding counter-productive information from a linking
value. In those cases where a close agreement is not obtained between the
direct value and the confirming values for a link, some detective work is
indicated and it may be necessary to reexamine the original Rasch scaling,
the item analysis, or the linking analysis before the reason for the
problem is uncovered.
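
The comparison of a direct value with a confirming value can be sketched as follows. The text does not state how close the agreement must be, so the `tolerance` parameter, the function names, and the A->C and C->B values used here are hypothetical; only the direct value of +20 comes from the example.

```python
def confirming_value(a_to_c, c_to_b):
    """Indirect A->B value obtained by passing through a third test C."""
    return a_to_c + c_to_b


def check_link(direct, a_to_c, c_to_b, tolerance=2.0):
    """Compare the direct A->B value with its confirming value.

    Returns (confirming value, discrepancy, agrees), where agrees is True
    when the discrepancy is within the assumed tolerance (in RITs).
    """
    confirm = confirming_value(a_to_c, c_to_b)
    discrepancy = direct - confirm
    return confirm, discrepancy, abs(discrepancy) <= tolerance


# Direct value from the text; the two legs through test C are made up.
good = check_link(direct=20, a_to_c=12, c_to_b=9)
bad = check_link(direct=20, a_to_c=12, c_to_b=1)
print(good)
print(bad)
```

A failed check does not say which of the three links is at fault; as the text notes, that calls for reexamining the scaling, the item analysis, or the linking analysis.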



Figure 2

Calibrations for a Subset of Items on Two Different Tests

            Calibration*    Calibration
Item        Test A          Test B

X           190             210
Y           187             206
Z           214             235

Average     197             217

* Calibrations are shown in Rasch units (RITS)

Figure 3

Example of the Triangulation of Linking Values

Linking value adjusting Test A calibrations
to fit in Test B: +20
It is important to note that the philosophy behind triangulation implies
that the discrepancies in linking values are caused by factors which should
not be ignored. The linking network serves as a tool for identifying the
discrepancies and the errors which caused them so that appropriate
corrective action can be taken to remove the faulty data. This philosophy
stands in contrast to some current linking procedures which average the
discrepancies among links without any attempt to identify and eliminate
possible sources of error.

Linking Procedures 

The previous sections have described how linking establishes the item
relationships which would exist if all items in a pool were calibrated in
a single testing operation, and how confirming through triangulation
validates the accuracy of the linking procedure. This section will
describe a strategy for efficient linking. First, there will be an
examination of the fundamental issues involved in constructing an item bank and
field testing. Then there will be a consideration of the way in which
several specific linking patterns provide the information needed to develop
the scale for items being field tested.

Curriculum Considerations

Since curriculum integrity is essential to an item bank, it is
necessary to carefully specify goals to which a bank relates and to precisely
tie each item to these goals. In developing item banks in reading,
mathematics and language usage, the Northwest Evaluation Association (NWEA) has
used the comprehensive Tri-County Course Goal Collections for this purpose.
(While NWEA efforts have proven effective in these subjects based on rules
and conventions, item banking in other fields such as science and social studies
where relationships are discovered rather than decreed remains to be proven.)

Curriculum balance is important in the design of field tests to
make the exercise reasonable for teachers and students. A forty item field
test which relates to the learning experiences of students can leave a
positive reaction with a class, while a test which has too many items on
the same goal or which is irrelevant to the goals of classroom instruction
can be repetitious and boring. Curriculum decisions concerning the scope
of content to be included in a test and the number of items to represent each
goal will affect the test format, test length, the number of tests needed,
the composition of the links between tests, and the target population for
field testing. After the bank has been established and permanent tests are
being built, decisions about the scope and sequence of content, the
representation given different goals and the information needed from the test
will help ensure the success of the resulting testing program.

Administration Considerations

In addition to curriculum issues, it is important to consider a
variety of administrative issues, especially as they affect the mental
set of the teacher and students and, ultimately, the completeness and
accuracy of the test information.

First, it is essential that the participation of principals and
teachers in field testing be voluntary. We have found that one way to
encourage volunteer participation is to return a report to the teacher
on a rapid turnaround schedule which is designed for classroom use. In
addition to sparking interest, this policy increases the teacher's
attention to the completeness and accuracy of information turned in for her
class.

Second, the teacher volunteers need to be informed about the
experimental nature of the tests, and that many of the test questions are new
and untried. If they are encouraged to suggest improvements to the
questions they don't like, an additional valuable source of information
will be available for the item analysis.

Finally, it is important to use a test length for field testing
which does not present difficulties for the teacher and, therefore,
adversely affect her mental set and that of the students. After several
years of experience, we have found that forty multiple choice items is
about the maximum test length which does not present scheduling problems
or introduce a speededness factor. While the maximum test length may differ
from subject to subject, it is important to give the student sufficient time
to complete the test if he knows the subject matter.

When these measures are followed, they lead to a better quality of
test information and also contribute to the willingness of the teachers
(as well as those with whom they discuss the program) to cooperate in
future field testing.

Linking Networks

The pattern of item links (subsets) between pairs of field tests
which is used to establish an item pool or expand an item bank is called a
linking network. The purpose of the network is to ensure that sufficient
information will be available to calculate accurate linking values for
each test to the final scale, thereby fixing the level of each item that
has been field tested. Further consideration of this goal leads to a
few general principles of network design.

First, it is usually not a good idea to use extremely large links.
While it is true that linking values become more dependable as the size
of the link is increased, a length of forty items is usually the maximum
field test size which can be accommodated in normal administration.
Together with the fact that the length of a link is only one factor
affecting the accuracy of the linking value, this leads to the conclusion
that using a link size beyond ten items usually unnecessarily reduces the
item yield from field testing.

Second, it is necessary to assess the previous information available
for your items. In addition to the number of items to be field tested and
the existence of an established prior bank, two essential variables which
affect a network are: (1) the number of independent calibrations needed
for each item, and (2) the number of confirming values needed for each
link. For example, in establishing an item pool with previously untried
items each item should be calibrated at least three times and each link
should be confirmed at least four times. On the other hand, if the items
have been used and item analyzed before under good test administration
conditions, then it might be reasonable to calibrate each item only twice
and confirm each link twice.
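
The two design variables named above can be counted mechanically for any proposed linking pattern. The sketch below is one possible way to do so, not the authors' procedure: the four-test, six-block pattern is hypothetical (it is not one of the networks described in this paper), and all function names are assumptions. A third test is counted as confirming a link only if it shares blocks with both tests and the three links are not all the same block.

```python
from collections import Counter
from itertools import combinations


def calibrations_per_block(pattern):
    """Count how many tests (independent calibrations) include each block."""
    counts = Counter()
    for blocks in pattern.values():
        counts.update(set(blocks))
    return counts


def direct_links(pattern):
    """Map each pair of tests to the set of blocks they share, if any."""
    links = {}
    for t1, t2 in combinations(sorted(pattern), 2):
        shared = set(pattern[t1]) & set(pattern[t2])
        if shared:
            links[(t1, t2)] = shared
    return links


def confirming_tests(pattern, t1, t2):
    """List third tests that can confirm the t1-t2 link by triangulation."""
    direct = set(pattern[t1]) & set(pattern[t2])
    result = []
    for t3 in sorted(pattern):
        if t3 in (t1, t2):
            continue
        leg1 = set(pattern[t1]) & set(pattern[t3])
        leg2 = set(pattern[t3]) & set(pattern[t2])
        # Skip the degenerate case where all three links use the same block.
        if leg1 and leg2 and not (leg1 == leg2 == direct):
            result.append(t3)
    return result


# Hypothetical pattern: test name -> numbered item blocks on that test.
pattern = {
    "T1": [1, 2, 3],
    "T2": [1, 4, 5],
    "T3": [2, 4, 6],
    "T4": [3, 5, 6],
}

print(calibrations_per_block(pattern))
print(direct_links(pattern))
print(confirming_tests(pattern, "T1", "T2"))
```

In this toy pattern every block is calibrated twice and every link has one direct and two confirming paths, which would meet the lighter requirement suggested for previously analyzed items but not the stricter one for untried items.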

Third, if the same link is used in all three tests involved in a
triangulation, then the confirming value is of no use (see Appendix A).

Fourth, research on field testing has provided some rules of thumb
for developing the linking network. Forster, et al. (1978)* have shown
that:

(1) it is not necessary to use random samples to obtain accurate
item calibrations,

(2) 200 or more students are sufficient to obtain accurate item
calibrations,

(3) item linking values conform to an equal interval scale, and

(4) item calibrations are not biased by levels of the other items
included in the field test.

The following section describes the use of these guidelines in the design
of linking networks related to situations actually encountered in building
or expanding an item bank.

*Forster, F., Ingebo, G.I. and Wolmut, P. The Rasch Model Monograph Series.
The Northwest Evaluation Association, December 1978.

Examples of Linking Networks

The Foursquare Network

The first network discussed here was used to establish the NWEA language
arts item bank from a large number of previously untried items. As shown in
Figure 4, the Foursquare network is based on the principle of calibrating
each item three times to produce two direct and four confirming values
for each link, which makes it ideal for determining and excluding questionable data
in building the bank. The design for the field tests, with each block of
ten items numbered, is shown in Figure 5. The pattern includes 88
blocks of ten items each (880 items), arranged in 64 tests requiring
approximately 13,000 students, giving a yield of 13.75 items per test.

As shown in Figure 6, this linking network derives its name from the
fact that each group of four tests can be analyzed to give a pool of 80
items. Each group of four 80 item pools can then be analyzed to give a
240 item pool. The total of four 240 item pools can then be analyzed to
give a single 880 item pool. Thus, there are three successive "phases"
of the analysis leading to the final item pool.

In analyzing each of the links between two tests (say LA1 and LAS)
the following triangulations can be made:





This configuration then provides two direct and four confirming values for
each link. Once tests LA1 through LA4 are combined into a single pool,
blocks 65, 66, and 67 link to the four adjacent pools (each formed by a
group of four tests) through a comparable analysis.

Figure 4

Phase I of the Foursquare Linking Network

Figure 5

The Foursquare Linking Pattern

Figure 6

Phase II of the Foursquare Linking Network

Legend: the marked blocks are item blocks which may be used to link the
network to an item bank or another network.

The redundancy of direct links in this network strengthens the design
and simplifies the identification of bad test information in any given
test. Since organizing a field testing effort of this magnitude may leave
little time available for readministration of a test shown to be weak,
the redundancy provides a second direct value if one turns out to be
unacceptable. If both direct values are unacceptable, there are still six
more confirming values to salvage the link. When these failsafe features
of the network are not needed, the two 10 item links between each pair of
tests are combined to form six strong 20 item links among the four tests.
Another feature of this network is that when sixteen or more tests are
used it can accommodate a wide range of item levels. (In the case of the
NWEA language arts bank, the 64 tests spanned grades three through eight.)

Theoretically there is no upper limit to the number of items which
can be calibrated using this design. In addition, it can be modified to
produce "unbalanced" networks so that a group of four tests can easily be
linked to a group of sixteen or sixty-four.

The Three-by-Three Network

The Three-by-Three network is used when both new untried items and
some established items are available for forming a pool or linking to an
existing item bank. As shown in Figure 7, this network accommodates
ten tests. Assuming that test A is the lowest level test and test J is
the highest, care must be taken in selecting the items for link 12 since
it is included in both of these tests. Therefore, one of the considerations
in using this design is a determination of the appropriateness from a
curriculum perspective of using the same items in the highest and lowest
tests. (Normally, this constraint does not seriously limit a linking
effort since the three bank links provide solid confirming values for this
tenuous link.)

Using this design each new item is calibrated three times and each
linking value is confirmed four or more times, providing a yield of 120
new items or 12 items per test.

Figure 7

The Three-by-Three Linking Pattern and Linking Network

Legend: the marked blocks are established or bank items.



The Double Eight Linking Network

The Double Eight network provides a flexible structure for accommodating
mixes of proven and untried items. As shown in Figure 8, this network uses
sixteen tests arranged in ascending level (from test A to test P) with four
links to an existing bank. Each item is calibrated twice and each link
has one direct and two confirming values. The sixteen tests link 360 items
for a yield of 22.5 items per test.

One of the advantages of this design is its flexibility. In those
situations where it is necessary to strengthen the pool by more tightly
linking it to an existing bank, four additional bank links can be included.
At the opposite extreme, if the timeline for field testing can be extended
to accommodate readministering one or more forms, or nearly all the items
will be included in a subsequent extensive testing program for recalibration,
the links can be cut to eight items and a significantly higher yield will
result. A third variation would be to repeat the design several times
along parallel linking paths from a low level to a high level, which would
accommodate a wide range of curriculum scope and sequence and still provide
adequate monitoring of the overall system.

The Octagon Linking Network 

As shown in Figure 9, the Octagon network uses eight tests which are
linked to an existing item bank and is appropriate when the field test
length can be increased to 45 items. (The pattern uses five blocks of
nine items for each test.) Each item is calibrated three times and there are
one direct and four confirming values for each link. The network links
108 items (12 blocks of nine) to the bank for a yield of 13.5 items per
test.

A useful variation on this design is to repeat the Octagon design four
times as shown in Figure 10, and to use the "Bank" links to tie the four
networks together. (This results in a structure quite similar to the
Foursquare pattern at the center of the four octagons.) The resulting network
can be analyzed in two phases, first to create pools from each octagon
of eight tests, and then to link the four pools with two direct and six
confirming values for the links between pools.
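
The items-per-test yields quoted for the four networks follow directly from the item and test counts given in the text. A brief Python sketch makes the comparison explicit; the dictionary layout and the function name `items_per_test` are illustrative assumptions.

```python
def items_per_test(items_linked, tests_used):
    """Yield of a network: items linked per field test administered."""
    return items_linked / tests_used


# (items linked, tests used) as stated in the text for each network.
networks = {
    "Foursquare": (880, 64),
    "Three-by-Three": (120, 10),
    "Double Eight": (360, 16),
    "Octagon": (108, 8),
}

yields = {name: items_per_test(*counts) for name, counts in networks.items()}
for name, value in yields.items():
    print(f"{name}: {value} items per test")
```

The Double Eight design has the highest yield of the four, which is consistent with its calibrating each item only twice rather than three times.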
Figure 8

The Double Eight Linking Network and Linking Pattern

This paper has presented a rationale for linking tests to form item
pools using Rasch calibration techniques and link triangulation. In
addition, these principles have been demonstrated in the development of
linking networks designed to meet different field test situations. The
philosophy behind the development of these procedures is the desire to
identify and eliminate faulty field test information. In actual practice
we have encountered many instances in which an error in test administration
which seriously biased item calibrations could only be identified by the
comparison of the direct and confirming linking values in an appropriate
network pattern. For this reason we believe the network approach to field
test design is more practical and defensible than other currently popular
procedures which average item calibrations without adequate regard
for the influence of errors of validity and reliability.




Appendix A

Using the Same Link in All Three Tests of a Triangulation

(1) Assume Tests A, B, and C with linking block 1.

(2) Let $I_A$ be the average calibration for the items in block 1 on test A.
Similarly for $I_B$ and $I_C$.

(3) Then: $(A \to B) = I_B - I_A$

          $(A \to C) = I_C - I_A$

          $(C \to B) = I_B - I_C$

(4) Then the confirming value is

    $(A \to B) = (A \to C) + (C \to B) = (I_C - I_A) + (I_B - I_C) = I_B - I_A$

(5) Since the confirming value must always be equal to the direct value
(A->B), the indirect link provides no confirming information.
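
The cancellation in step (4) can also be checked numerically. In the sketch below, the function name `confirming_via_same_block` and the test C averages are hypothetical; the averages 197 and 217 are taken from Figure 2. Whatever value test C produces for the shared block, even a badly wrong one, the confirming value reproduces the direct value exactly and so cannot reveal an error.

```python
def confirming_via_same_block(i_a, i_b, i_c):
    """Confirming value for A->B when tests A, B and C all share one block.

    i_a, i_b and i_c are the average calibrations of that block on each test.
    """
    a_to_c = i_c - i_a
    c_to_b = i_b - i_c
    return a_to_c + c_to_b


i_a, i_b = 197, 217   # block averages on tests A and B (Figure 2)
direct = i_b - i_a    # direct A->B linking value

# Test C's average for the block cancels out, whatever it is.
plausible = confirming_via_same_block(i_a, i_b, i_c=205)
absurd = confirming_via_same_block(i_a, i_b, i_c=9999)
print(direct, plausible, absurd)
```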